7 research outputs found

    A latent rhythm complexity model for attribute-controlled drum pattern generation

    Most music listeners have an intuitive understanding of the notion of rhythm complexity. Musicologists and scientists, however, have long sought objective ways to measure and model such a distinctively perceptual attribute of music. Whereas previous research has mainly focused on monophonic patterns, this article presents a novel perceptually informed rhythm complexity measure specifically designed for polyphonic rhythms, i.e., patterns in which multiple simultaneous voices cooperate toward creating a coherent musical phrase. We focus on drum rhythms relating to the Western musical tradition and validate the proposed measure through a perceptual test where users were asked to rate the complexity of real-life drumming performances. Hence, we propose a latent vector model for rhythm complexity based on a recurrent variational autoencoder tasked with learning the complexity of input samples and embedding it along one latent dimension. Aided by an auxiliary adversarial loss term promoting disentanglement, this effectively regularizes the latent space, thus enabling explicit control over the complexity of newly generated patterns. Trained on a large corpus of MIDI files of polyphonic drum recordings, the proposed method proved capable of generating coherent and realistic samples at the desired complexity value. In our experiments, output and target complexities show a high correlation, and the latent space appears interpretable and continuously navigable. On the one hand, this model can readily contribute to a wide range of creative applications, including, for instance, assisted music composition and automatic music generation. On the other hand, it brings us one step closer toward achieving the ambitious goal of equipping machines with a human-like understanding of perceptual features of music.
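
    The controllable-generation idea can be pictured with a toy recurrent VAE whose first latent dimension is tied to a complexity rating. The sketch below is a hypothetical simplification in PyTorch: the architecture, dimensions, and the plain MSE attribute loss are assumptions, and the paper's adversarial disentanglement term is omitted.

```python
# Hypothetical sketch, not the authors' implementation: a recurrent VAE whose
# first latent dimension is encouraged to track a scalar complexity rating,
# so that newly generated patterns can be steered by setting that dimension.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeVAE(nn.Module):
    def __init__(self, n_drums=9, hidden=128, latent=32):
        super().__init__()
        self.encoder = nn.GRU(n_drums, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.decoder = nn.GRU(latent, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, n_drums)

    def forward(self, x):
        # x: (batch, steps, n_drums) binary drum-roll representation
        _, h = self.encoder(x)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        z_seq = z.unsqueeze(1).repeat(1, x.shape[1], 1)
        y, _ = self.decoder(z_seq)
        return self.readout(y), mu, logvar

def loss_fn(x, logits, mu, logvar, complexity, beta=1.0, gamma=1.0):
    recon = F.binary_cross_entropy_with_logits(logits, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Attribute regularizer: push the first latent coordinate toward the
    # complexity rating of each training pattern (plain MSE stand-in for the
    # paper's regularization-plus-adversarial scheme).
    attr = F.mse_loss(mu[:, 0], complexity)
    return recon + beta * kld + gamma * attr
```

    At generation time, one would sample the remaining latent coordinates, fix the first one to the desired complexity value, and decode.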

    A Deep Learning Approach for Low-Latency Packet Loss Concealment of Audio Signals in Networked Music Performance Applications

    Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications: it aims at revolutionizing the traditional concept of musical interaction by enabling remote musicians to interact and perform together through a telecommunication network. Ensuring realistic conditions for music performance, however, constitutes a significant engineering challenge due to extremely strict requirements in terms of audio quality and, most importantly, network delay. To minimize the end-to-end delay experienced by the musicians, typical implementations of NMP applications use uncompressed, bidirectional audio streams and leverage UDP as the transport protocol. Since UDP is connectionless and unreliable, audio packets lost in transit are not retransmitted and thus cause glitches in the receiver's audio playout. This article describes a technique for predicting lost packet content in real time using a deep learning approach. The ability to conceal errors in real time can help mitigate audio impairments caused by packet losses, thus improving the quality of audio playout in real-world scenarios.
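
    As a rough illustration of real-time packet loss concealment, the hypothetical sketch below predicts the waveform of a missing packet from the preceding context with a small feed-forward network; the packet size, context length, and architecture are illustrative assumptions rather than the model described in the article.

```python
# Hypothetical sketch, not the paper's model: conceal a lost packet by
# predicting its samples from the preceding audio with a small neural network.
# Packet size, context length, and architecture are illustrative assumptions.
import torch
import torch.nn as nn

PACKET = 128           # samples per packet (assumed)
CONTEXT = 8 * PACKET   # context window fed to the predictor (assumed)

predictor = nn.Sequential(
    nn.Linear(CONTEXT, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, PACKET),
)

def conceal(history: torch.Tensor) -> torch.Tensor:
    """Predict the waveform of a missing packet from the last CONTEXT samples."""
    with torch.no_grad():
        return predictor(history[-CONTEXT:].unsqueeze(0)).squeeze(0)

# When a packet misses its playout deadline, the prediction is spliced into the
# output buffer instead of zeros, avoiding an audible glitch.
past_audio = torch.randn(CONTEXT)   # stand-in for previously decoded samples
substitute = conceal(past_audio)    # PACKET predicted samples
```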

    Data-Driven Parameter Estimation of Lumped-Element Models via Automatic Differentiation

    Lumped-element models (LEMs) provide a compact characterization of numerous real-world physical systems, including electrical, acoustic, and mechanical systems. However, even when the target topology is known, deriving model parameters that approximate a possibly distributed system often requires educated guesses or dedicated optimization routines. This article presents a general framework for the data-driven estimation of lumped parameters using automatic differentiation. Inspired by recent work on physical neural networks, we propose to explicitly embed a differentiable LEM in the forward pass of a learning algorithm and discover its parameters via backpropagation. The same approach could also be applied to blindly parameterize an approximating model that shares no isomorphism with the target system, for which it would thus be challenging to exploit prior knowledge of the underlying physics. We evaluate our framework on various linear and nonlinear systems, including time- and frequency-domain learning objectives, and consider real- and complex-valued differentiation strategies. In all our experiments, we were able to achieve a near-perfect match of the system state measurements and retrieve the true model parameters whenever possible. Besides its practical interest, the present approach provides a fully interpretable input-output mapping by exposing the topological structure of the underlying physical model, and it may therefore constitute an explainable ad hoc alternative to otherwise black-box methods.
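
    The core recipe can be illustrated on the simplest possible LEM, a first-order RC low-pass filter, whose response depends only on the time constant R*C. The PyTorch sketch below is a toy example under stated assumptions, not the authors' framework: it fits the time constant to a synthetic frequency response by gradient descent through the differentiable model.

```python
# Toy sketch, not the authors' framework: recover the time constant of a
# first-order RC low-pass filter by embedding its frequency response in the
# forward pass and backpropagating through it. Frequencies, the synthetic
# "measurement", and the optimizer settings are illustrative assumptions.
import torch

w = 2 * torch.pi * torch.logspace(1, 5, 200)        # angular frequency grid (rad/s)
tau_true = 1.0e3 * 100e-9                            # R*C of the target circuit
H_meas = 1 / (1 + 1j * w * tau_true)                 # "measured" response (synthetic here)

log_tau = torch.tensor(-7.0, requires_grad=True)     # optimize in log-space for positivity
opt = torch.optim.Adam([log_tau], lr=0.05)

for _ in range(2000):
    H_pred = 1 / (1 + 1j * w * torch.exp(log_tau))   # differentiable lumped-element model
    loss = torch.mean(torch.abs(H_pred - H_meas) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.exp(log_tau).item())                     # approaches tau_true = 1e-4 s
```

    The same pattern extends to richer topologies: as long as the model output is a differentiable function of its lumped parameters, any gradient-based optimizer can be used in place of hand-tuned guesses.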

    Zero-shot anomalous sound detection in domestic environments using large-scale pretrained audio pattern recognition models

    Anomalous sound detection is central to audio-based surveillance and monitoring. In a domestic environment, however, the classes of sounds to be considered anomalous are situation-dependent and cannot be determined in advance. At the same time, it is not feasible to expect a demanding labeling effort from the end user. To address these problems, we present a novel zero-shot method relying on an auxiliary large-scale pretrained audio neural network in support of an unsupervised anomaly detector. The auxiliary module is tasked to generate a fingerprint for each sound occasionally registered by the user. These fingerprints are then compared with those extracted from the input audio stream, and the resulting similarity score is used to increase or reduce the sensitivity of the base detector. Experimental results on synthetic data show that the proposed method substantially improves upon the unsupervised base detector and is capable of outperforming existing few-shot learning systems developed for machine condition monitoring without involving additional training.
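
    The fingerprint-matching step might look roughly like the sketch below, where embed stands in for the large-scale pretrained audio network and the way similarity modulates the detector's threshold is an illustrative assumption, not the exact mechanism described in the paper.

```python
# Hypothetical sketch of the fingerprint-matching idea; `embed` is a placeholder
# for the large-scale pretrained audio network, and the threshold-modulation
# rule is an illustrative assumption.
import numpy as np

def embed(audio: np.ndarray) -> np.ndarray:
    """Placeholder: map an audio clip to a fixed-size fingerprint with a
    pretrained audio pattern recognition model."""
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def adjusted_threshold(frame, registered_fingerprints, base_threshold, k=0.5):
    """Make the base anomaly detector more sensitive when the incoming frame
    resembles one of the sounds registered by the user (and vice versa)."""
    z = embed(frame)
    similarity = max(cosine(z, f) for f in registered_fingerprints)
    return base_threshold * (1.0 - k * similarity)
```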

    Feature projection-based unsupervised domain adaptation for acoustic scene classification

    The mismatch between the data distributions of training and test data acquired under different recording conditions and using different devices is known to severely impair the performance of acoustic scene classification (ASC) systems. To address this issue, we propose an unsupervised domain adaptation method for ASC based on the projection of spectro-temporal features extracted from both the source and target domain onto the principal subspace spanned by the eigenvectors of the sample covariance matrix of source-domain training data. Using the TUT Urban Acoustic Scenes 2018 Mobile Development dataset, we show that the proposed method outperforms state-of-the-art unsupervised domain adaptation techniques when applied jointly with a convolutional ASC model and can also be practically employed as a feature extraction procedure for shallower artificial neural networks.
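
    A minimal sketch of the projection step, assuming plain NumPy and 2-D feature matrices, with the principal subspace estimated from source-domain training data only (implementation details are assumptions):

```python
# Minimal NumPy sketch, assuming feature matrices of shape (n_samples, n_features);
# the principal subspace is estimated on source-domain training data only.
import numpy as np

def fit_principal_subspace(X_source: np.ndarray, n_components: int):
    mean = X_source.mean(axis=0)
    cov = np.cov(X_source - mean, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :n_components]       # top principal eigenvectors
    return mean, basis

def project(X: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    return (X - mean) @ basis

# Usage: fit on source-domain features, then apply the same projection to both
# source- and target-domain features before training/evaluating the ASC model.
```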

    Virtual Bass Enhancement Via Music Demixing

    Virtual Bass Enhancement (VBE) refers to a class of digital signal processing algorithms that aim at enhancing the perception of low frequencies in audio applications. Such algorithms typically exploit well-known psychoacoustic effects and are particularly valuable for improving the performance of small-size transducers often found in consumer electronics. Though both time- and frequency-domain techniques have been proposed in the literature, none of them capitalizes on the latest achievements of deep learning as far as music processing is concerned. In this letter, we propose a novel time-domain VBE algorithm that incorporates a deep neural network for music demixing as part of the processing pipeline. This technique is shown to improve bass perception and reduce inharmonic distortion, i.e., the main issue of existing time-domain VBE algorithms. The results of a perceptual test are then presented, showing that the proposed method outperforms state-of-the-art algorithms in terms of both bass enhancement and basic audio quality.
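
    A demixing-based VBE chain could be organized along the lines of the hedged sketch below, where demix_bass is a placeholder for a neural music demixing model and the tanh harmonic generator and high-pass filter settings are illustrative assumptions, not the algorithm proposed in the letter.

```python
# Hedged sketch of a demixing-based VBE chain; `demix_bass` is a placeholder for
# a neural music demixing model, and the tanh nonlinearity and 4th-order
# high-pass at 150 Hz are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfilt

def demix_bass(x: np.ndarray, sr: int):
    """Placeholder: split the input mixture into (bass_stem, residual_mix)."""
    raise NotImplementedError

def virtual_bass(x: np.ndarray, sr: int, cutoff: float = 150.0) -> np.ndarray:
    bass, rest = demix_bass(x, sr)
    harmonics = np.tanh(4.0 * bass)                             # nonlinearity creates upper harmonics
    sos = butter(4, cutoff, btype="highpass", fs=sr, output="sos")
    return sosfilt(sos, rest + bass) + sosfilt(sos, harmonics)  # high-passed mix + harmonic bass cue
```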

    Unsupervised domain adaptation via principal subspace projection for acoustic scene classification

    Existing acoustic scene classification (ASC) systems often fail to generalize across different recording devices. In this work, we present an unsupervised domain adaptation method for ASC based on data standardization and feature projection. First, log-amplitude spectro-temporal features are standardized in a band-wise fashion over samples and time. Then, both source- and target-domain samples are projected onto the span of the principal eigenvectors of the covariance matrix of source-domain training data. The proposed method, being devised as a preprocessing procedure, is independent of the choice of the classification algorithm and can be readily applied to any ASC model at a minimal cost. Using the TUT Urban Acoustic Scenes 2018 Mobile Development dataset, we show that the proposed method can provide an absolute improvement of over 10% compared to state-of-the-art unsupervised adaptation methods. Furthermore, the proposed method consistently outperforms a recent ASC model that ranked first in Task 1-A of the 2021 DCASE Challenge when evaluated on various unseen devices from the TAU Urban Acoustic Scenes 2020 Mobile Development dataset. In addition, our method remains robust even when provided with a small amount of target-domain data, proving effective with as little as 90 seconds of test audio recordings. Finally, we show that the proposed adaptation method can also be employed as a feature extraction stage for shallower neural networks, thus significantly reducing model complexity.
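
    The band-wise standardization stage might be implemented along the lines of the sketch below (array shapes and parameter names are assumptions); the subsequent eigenvector projection follows the same recipe sketched for the related entry above, with all statistics estimated on source-domain data only.

```python
# Sketch of the band-wise standardization stage under assumed array shapes
# (n_samples, n_bands, n_frames); statistics come from source-domain data only
# and are reused unchanged on target-domain clips.
import numpy as np

def bandwise_standardize(S: np.ndarray, mean=None, std=None):
    """Standardize log-amplitude spectrograms per frequency band over samples and time."""
    if mean is None:
        mean = S.mean(axis=(0, 2), keepdims=True)
        std = S.std(axis=(0, 2), keepdims=True) + 1e-8
    return (S - mean) / std, mean, std

# Usage: Z_src, m, s = bandwise_standardize(S_source)
#        Z_tgt, _, _ = bandwise_standardize(S_target, m, s)
```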